A shared data cluster consists of multiple independent servers sharing a set of common disk drives over a network. In a production deployment the disk sharing is done over a separate network from the one used for general communication between the servers and clients. Until recently, this separate Storage Area Network (SAN) had to be assembled from expensive Fibre Channel network adapters and switches. It is now possible to assemble a SAN from standard Ethernet adapters and switches.
A shared data cluster with a topology like that shown below provides numerous benefits when deploying an application like the Genezzo database. These benefits are in the areas of scalability, high availability, and most recently affordability. Several of these benefits arise because every server has equal access to every shared disk. The cluster scales in compute power by adding servers, and in capacity by adding disks. The data stored by the cluster is highly available because a single server failure does not prevent access to any of the data, and because the SAN disks can easily be set up in a RAID configuration to guard against disk failures. Finally, clusters built with commodity AMD/Intel processors, ATA/SATA hard drives, and Ethernet adapters and switches are very affordable. The major drawback of shared data clusters is the greater complexity of the operating system(s) and applications required to use them. The holy grail at both the operating system and database level is the "single system image": the illusion, presented to the higher-level applications and the users, that the cluster is a single monolithic system and that the underlying hardware complexity can safely be ignored.
AoE protocol support has been added to the latest Linux 2.6.11 kernel. Coraid provides open-source drivers for earlier 2.6 kernels, for 2.4 kernels (the 2.4 kernel drivers only support a single partition per disk drive), and for FreeBSD. The aoetools command-line utilities and the vblade virtual AoE disk emulator are available on SourceForge.
My current cluster configuration consists of 2 AMD Athlon servers (running Debian Linux with 2.6.11 kernels), one Intel Celeron server (running Debian Linux with a 2.6.12 kernel), and a single shared 200 GB ATA hard drive on the EtherDrive storage blade. The 2 servers and the blade are all connected through the same 100 Mbit Ethernet switch. I don't currently have a separate dedicated Ethernet SAN connection for the blade. In a production environment the servers would have a second Ethernet adapter connected to a separate Ethernet switch for connectivity to the blade(s). This would provide dedicated bandwidth. It would also provide security, since AoE is a non-routed OSI layer 2 protocol and its traffic would stay on the dedicated segment.
I have modified a pre-release version of Genezzo to use a raw device on the shared EtherDrive blade. Genezzo running on both servers is able to read and write this shared raw device.
modprobe raw
modprobe aoe
If your kernel does not have the aoe module, download the current AoE driver from Coraid and install it.
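Before going further it can help to confirm that both modules actually loaded. The following is a minimal sketch, not part of the Genezzo or Coraid tooling: the helper name module_loaded is hypothetical, and it assumes the modules appear in /proc/modules (a driver compiled directly into the kernel will not be listed there).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named module is listed in a
# /proc/modules-style file (the first field of each line is the module name).
module_loaded() {
    # $1 = module name, $2 = file to search (defaults to /proc/modules)
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Report the state of the two modules this setup needs.
for mod in raw aoe; do
    if module_loaded "$mod" 2>/dev/null; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded (try: modprobe $mod)"
    fi
done
```

The optional second argument exists only so the check can be pointed at a saved copy of /proc/modules.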
Alternately fdisk /dev/etherd/e0.0 can be used to create partitions after the drive is available over AoE.
mknod /dev/rawctl c 162 0
mkdir /dev/raw
mknod /dev/raw/raw1 c 162 1
mknod /dev/raw/raw2 c 162 2
mknod /dev/raw/raw3 c 162 3
This step may be necessary because later versions of Linux have deprecated raw devices in favor of the O_DIRECT flag to the open(2) system call.
<soapbox> O_DIRECT does not appear to be supported in Perl. And the open(2) man page BUGS section says "The thing that has always disturbed me about O_DIRECT is that the whole interface is just stupid, and was probably designed by a deranged monkey on some serious mind-controlling substances." -- Linus </soapbox>
aoe-mkdevs /dev/etherd
e0.0 eth0 up
e0.0 will differ if you have your EtherDrive set to a different shelf and slot number via the DIP switches. 0.0 is the default; alter the following command(s) as necessary.
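The device names used here follow the pattern e<shelf>.<slot>, with p<N> appended for partition N. As a small sketch of how to derive the right path for a blade that is not at the default 0.0, the helper below builds the name from its parts. The function name aoe_dev is hypothetical and only illustrates the naming seen in this article.

```shell
#!/bin/sh
# Hypothetical helper: build the block device path for an AoE target
# from its shelf and slot numbers, optionally with a partition number.
aoe_dev() {
    # $1 = shelf, $2 = slot, $3 = optional partition number
    if [ -n "$3" ]; then
        echo "/dev/etherd/e${1}.${2}p${3}"
    else
        echo "/dev/etherd/e${1}.${2}"
    fi
}

aoe_dev 0 0      # whole disk at the default shelf 0, slot 0
aoe_dev 1 2 3    # partition 3 of the blade at shelf 1, slot 2
```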
Now do ls -l /dev/etherd/e0.0*
brw-rw---- 1 root root 152, 0 Jun 23 20:30 /dev/etherd/e0.0
brw-rw---- 1 root root 152, 1 Jun 23 20:30 /dev/etherd/e0.0p1
brw-rw---- 1 root root 152, 10 Jun 23 20:30 /dev/etherd/e0.0p10
brw-rw---- 1 root root 152, 11 Jun 23 20:30 /dev/etherd/e0.0p11
brw-rw---- 1 root root 152, 12 Jun 23 20:30 /dev/etherd/e0.0p12
brw-rw---- 1 root root 152, 13 Jun 23 20:30 /dev/etherd/e0.0p13
brw-rw---- 1 root root 152, 14 Jun 23 20:30 /dev/etherd/e0.0p14
brw-rw---- 1 root root 152, 15 Jun 23 20:30 /dev/etherd/e0.0p15
brw-rw---- 1 root root 152, 2 Jun 23 20:30 /dev/etherd/e0.0p2
brw-rw---- 1 root root 152, 3 Jun 23 20:30 /dev/etherd/e0.0p3
brw-rw---- 1 root root 152, 4 Jun 23 20:30 /dev/etherd/e0.0p4
brw-rw---- 1 root root 152, 5 Jun 23 20:30 /dev/etherd/e0.0p5
brw-rw---- 1 root root 152, 6 Jun 23 20:30 /dev/etherd/e0.0p6
brw-rw---- 1 root root 152, 7 Jun 23 20:30 /dev/etherd/e0.0p7
brw-rw---- 1 root root 152, 8 Jun 23 20:30 /dev/etherd/e0.0p8
brw-rw---- 1 root root 152, 9 Jun 23 20:30 /dev/etherd/e0.0p9
Use the raw command to bind the partitions to raw devices. For the first three partitions use
raw /dev/raw/raw1 /dev/etherd/e0.0p1
raw /dev/raw/raw2 /dev/etherd/e0.0p2
raw /dev/raw/raw3 /dev/etherd/e0.0p3
To see the results use raw -qa
/dev/raw/raw1: bound to major 152, minor 1
/dev/raw/raw2: bound to major 152, minor 2
/dev/raw/raw3: bound to major 152, minor 3
Warning: Do NOT use /dev/etherd/e0.0 unless you are running without partitions or want to overwrite the disk partition table with your Genezzo database. Also, if you change the partitioning on any hard drive, reboot all servers so they will recognize the partitioning change.
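Because the bindings have to be repeated on every server, it may be convenient to generate the raw commands rather than type them. The sketch below only prints the commands so they can be reviewed first; the function name bind_cmds and the choice of three partitions are assumptions for illustration, and it always targets partitions, never the whole disk, in line with the warning above.

```shell
#!/bin/sh
# Print (not run) one raw bind command per partition, pairing
# /dev/raw/rawN with partition N of the given AoE disk.
bind_cmds() {
    # $1 = whole-disk device, $2 = number of partitions to bind
    i=1
    while [ "$i" -le "$2" ]; do
        echo "raw /dev/raw/raw$i ${1}p$i"
        i=$((i + 1))
    done
}

# Example: the first three partitions of the default blade.
bind_cmds /dev/etherd/e0.0 3
```

Once the printed commands look right, they can be run as root, for example by piping the output to sh.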
erollins@gigabyte:~$ gendba.pl -gnz_home=/dev/raw
loading...
setUseRaw(1)
automount = TRUE
automounting...
Genezzo Version 0.42 - Alpha 20050617 (www.genezzo.com)
Copyright (c) 2003, 2004, 2005 Jeffrey I Cohen. All rights reserved.
Type "SHOW" to obtain license information.
gendba 1>
Obtain the code here, and follow the directions here
I also installed the following Debian packages: libnet, libnet-dev, heartbeat, heartbeat-dev, glib, glib-dev, uuid, uuid-dev, libpils, kernel-headers-### (or linux-headers-###).
I skipped the bootstrap command (available in CVS, not in 0.9.2 or 0.9.3).
For me, OpenDLM version 0.9.2 only compiled on Linux 2.4. I used the following configure command:
./configure --with-linux-srcdir=/usr/src/kernel-headers-2.4.27-2-386 --with-heartbeat_library=/usr/lib/libhbclient.so --with-heartbeat_includes=/usr/include/heartbeat
For me, OpenDLM version 0.9.3 only compiled on Linux 2.6. I used the following configure command:
./configure --with-heartbeat --with-linux-srcdir=/usr/src/kernel-headers-2.6.11-1-k7
On Linux 2.6, editing /etc/modules.conf did not help. Instead:
cd /lib/modules/2.6.11-1-k7/extra
ln -s dlmdk_core.ko haDLM.ko
depmod -a
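The kernel release in the path above is specific to my machine. As a sketch of how the same workaround might be adapted to another kernel, the helper below prints the three commands for a given release, defaulting to the running kernel as reported by uname -r. The function name dlm_link_cmds is hypothetical, and it assumes the OpenDLM modules were installed under /lib/modules/<release>/extra as they were here.

```shell
#!/bin/sh
# Print (not run) the symlink workaround for a given kernel release.
# Assumes dlmdk_core.ko lives in /lib/modules/<release>/extra.
dlm_link_cmds() {
    # $1 = kernel release; defaults to the running kernel
    rel="${1:-$(uname -r)}"
    echo "cd /lib/modules/$rel/extra"
    echo "ln -s dlmdk_core.ko haDLM.ko"
    echo "depmod -a"
}

# Example: reproduce the commands used on the 2.6.11-1-k7 kernel.
dlm_link_cmds 2.6.11-1-k7
```

Printing the commands first makes it easy to check the path before running anything as root.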